Overview

Dataset statistics

Number of variables: 11
Number of observations: 99003
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 91
Duplicate rows (%): 0.1%
Total size in memory: 8.3 MiB
Average record size in memory: 88.0 B

Variable types

NUM: 10
CAT: 1

Warnings

Dataset has 91 (0.1%) duplicate rows [Duplicates]
mobile_likes_received is highly correlated with likes_received [High correlation]
likes_received is highly correlated with mobile_likes_received and 1 other field (www_likes_received) [High correlation]
www_likes_received is highly correlated with likes_received [High correlation]
likes_received is highly skewed (γ1 = 112.0745682) [Skewed]
mobile_likes_received is highly skewed (γ1 = 107.5312999) [Skewed]
www_likes_received is highly skewed (γ1 = 126.257317) [Skewed]
friend_count has 1962 (2.0%) zeros [Zeros]
friendships_initiated has 2997 (3.0%) zeros [Zeros]
likes has 22308 (22.5%) zeros [Zeros]
likes_received has 24428 (24.7%) zeros [Zeros]
mobile_likes has 35056 (35.4%) zeros [Zeros]
mobile_likes_received has 30003 (30.3%) zeros [Zeros]
www_likes has 60999 (61.6%) zeros [Zeros]
www_likes_received has 36864 (37.2%) zeros [Zeros]

Reproduction

Analysis started: 2020-10-29 06:01:21.415969
Analysis finished: 2020-10-29 06:01:46.765522
Duration: 25.35 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

age
Real number (ℝ≥0)

Distinct: 101
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.28022383
Minimum: 13
Maximum: 113
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 13
5th percentile: 15
Q1: 20
Median: 28
Q3: 50
95th percentile: 90
Maximum: 113
Range: 100
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 22.58974831
Coefficient of variation (CV): 0.6059445462
Kurtosis: 1.561446767
Mean: 37.28022383
Median Absolute Deviation (MAD): 10
Skewness: 1.415260654
Sum: 3690854
Variance: 510.2967289
Monotonicity: Not monotonic
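The report lists these statistics without defining them. The sketch below assumes the standard definitions: coefficient of variation = standard deviation / mean, variance = standard deviation squared, and MAD = the unscaled median of absolute deviations from the median. That last reading fits the whole-number MAD values reported here. The sketch recomputes the CV and variance from the age figures above and applies the MAD definition to a made-up sample. `median_absolute_deviation` is an illustrative helper, not a pandas-profiling function.

```python
from statistics import median

# Figures copied from the age section of this report.
AGE_MEAN = 37.28022383
AGE_SD = 22.58974831

cv = AGE_SD / AGE_MEAN    # coefficient of variation = standard deviation / mean
variance = AGE_SD ** 2    # variance = standard deviation squared


def median_absolute_deviation(values):
    """Unscaled MAD: the median of the absolute deviations from the median."""
    m = median(values)
    return median(abs(v - m) for v in values)


print(round(cv, 4), round(variance, 2))  # agrees with the reported CV and variance after rounding
# Hypothetical toy sample, not taken from the dataset:
print(median_absolute_deviation([13, 18, 20, 28, 50, 90]))  # 8.5
```

Because the mean and standard deviation shown in the report are themselves rounded, recomputed values agree only to about eight significant digits.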
Histogram with fixed size bins (bins=50)
Common values
Value   Count   Frequency (%)
18      5196    5.2%
23      4404    4.4%
19      4391    4.4%
20      3769    3.8%
21      3671    3.7%
25      3641    3.7%
17      3283    3.3%
16      3086    3.1%
22      3032    3.1%
24      2827    2.9%
Other values (91)  61703  62.3%

Minimum 5 values
Value   Count   Frequency (%)
13      484     0.5%
14      1925    1.9%
15      2618    2.6%
16      3086    3.1%
17      3283    3.3%

Maximum 5 values
Value   Count   Frequency (%)
113     202     0.2%
112     18      < 0.1%
111     18      < 0.1%
110     15      < 0.1%
109     9       < 0.1%

gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 773.5 KiB
Value   Count   Frequency (%)
male    58749   59.3%
female  40254   40.7%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 6
Median length: 4
Mean length: 4.813187479
Min length: 4

tenure
Real number (ℝ≥0)

Distinct: 2426
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8848318
Minimum: 0
Maximum: 3139
Zeros: 70
Zeros (%): 0.1%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 47
Q1: 226
Median: 412
Q3: 675
95th percentile: 1575
Maximum: 3139
Range: 3139
Interquartile range (IQR): 449

Descriptive statistics

Standard deviation: 457.645601
Coefficient of variation (CV): 0.8508245147
Kurtosis: 2.199181661
Mean: 537.8848318
Median Absolute Deviation (MAD): 213
Skewness: 1.535709166
Sum: 53252212
Variance: 209439.4961
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value   Count   Frequency (%)
300     173     0.2%
303     170     0.2%
242     164     0.2%
272     163     0.2%
257     161     0.2%
297     161     0.2%
285     160     0.2%
280     160     0.2%
284     158     0.2%
278     158     0.2%
Other values (2416)  97375  98.4%

Minimum 5 values
Value   Count   Frequency (%)
0       70      0.1%
1       60      0.1%
2       72      0.1%
3       79      0.1%
4       86      0.1%

Maximum 5 values
Value   Count   Frequency (%)
3139    3       < 0.1%
3129    1       < 0.1%
3128    1       < 0.1%
3101    1       < 0.1%
3019    1       < 0.1%

friend_count
Real number (ℝ≥0)

ZEROS

Distinct: 2562
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 196.3507873
Minimum: 0
Maximum: 4923
Zeros: 1962
Zeros (%): 2.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 31
Median: 82
Q3: 206
95th percentile: 720
Maximum: 4923
Range: 4923
Interquartile range (IQR): 175

Descriptive statistics

Standard deviation: 387.304229
Coefficient of variation (CV): 1.972511719
Kurtosis: 50.09427289
Mean: 196.3507873
Median Absolute Deviation (MAD): 64
Skewness: 6.059008484
Sum: 19439317
Variance: 150004.5658
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value   Count   Frequency (%)
0       1962    2.0%
1       1816    1.8%
2       1117    1.1%
3       860     0.9%
5       789     0.8%
4       749     0.8%
10      737     0.7%
24      732     0.7%
6       720     0.7%
29      719     0.7%
Other values (2552)  88802  89.7%

Minimum 5 values
Value   Count   Frequency (%)
0       1962    2.0%
1       1816    1.8%
2       1117    1.1%
3       860     0.9%
4       749     0.8%

Maximum 5 values
Value   Count   Frequency (%)
4923    1       < 0.1%
4917    1       < 0.1%
4863    1       < 0.1%
4845    1       < 0.1%
4844    1       < 0.1%

friendships_initiated
Real number (ℝ≥0)

ZEROS

Distinct: 1519
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.4524711
Minimum: 0
Maximum: 4144
Zeros: 2997
Zeros (%): 3.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 17
Median: 46
Q3: 117
95th percentile: 418
Maximum: 4144
Range: 4144
Interquartile range (IQR): 100

Descriptive statistics

Standard deviation: 188.786951
Coefficient of variation (CV): 1.756934475
Kurtosis: 42.53560096
Mean: 107.4524711
Median Absolute Deviation (MAD): 36
Skewness: 5.150757415
Sum: 10638117
Variance: 35640.51287
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value   Count   Frequency (%)
0       2997    3.0%
1       2212    2.2%
2       1551    1.6%
3       1355    1.4%
4       1352    1.4%
6       1328    1.3%
5       1328    1.3%
11      1319    1.3%
8       1314    1.3%
13      1279    1.3%
Other values (1509)  82968  83.8%

Minimum 5 values
Value   Count   Frequency (%)
0       2997    3.0%
1       2212    2.2%
2       1551    1.6%
3       1355    1.4%
4       1352    1.4%

Maximum 5 values
Value   Count   Frequency (%)
4144    1       < 0.1%
3654    1       < 0.1%
3594    1       < 0.1%
3538    1       < 0.1%
3415    1       < 0.1%

likes
Real number (ℝ≥0)

ZEROS

Distinct: 2924
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 156.0787855
Minimum: 0
Maximum: 25111
Zeros: 22308
Zeros (%): 22.5%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 11
Q3: 81
95th percentile: 726
Maximum: 25111
Range: 25111
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 572.2806808
Coefficient of variation (CV): 3.666614134
Kurtosis: 200.4456878
Mean: 156.0787855
Median Absolute Deviation (MAD): 11
Skewness: 11.02370356
Sum: 15452268
Variance: 327505.1777
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value   Count   Frequency (%)
0       22308   22.5%
1       6928    7.0%
2       4434    4.5%
3       3240    3.3%
4       2507    2.5%
5       2027    2.0%
6       1806    1.8%
7       1618    1.6%
8       1430    1.4%
9       1381    1.4%
Other values (2914)  51324  51.8%

Minimum 5 values
Value   Count   Frequency (%)
0       22308   22.5%
1       6928    7.0%
2       4434    4.5%
3       3240    3.3%
4       2507    2.5%

Maximum 5 values
Value   Count   Frequency (%)
25111   1       < 0.1%
21652   1       < 0.1%
16732   1       < 0.1%
16583   1       < 0.1%
14799   1       < 0.1%

likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 2681
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 142.6893629
Minimum: 0
Maximum: 261197
Zeros: 24428
Zeros (%): 24.7%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 8
Q3: 59
95th percentile: 561
Maximum: 261197
Range: 261197
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 1387.919613
Coefficient of variation (CV): 9.726861091
Kurtosis: 17384.94
Mean: 142.6893629
Median Absolute Deviation (MAD): 8
Skewness: 112.0745682
Sum: 14126675
Variance: 1926320.851
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value   Count   Frequency (%)
0       24428   24.7%
1       7305    7.4%
2       4541    4.6%
3       3347    3.4%
4       2669    2.7%
5       2373    2.4%
6       1873    1.9%
7       1680    1.7%
8       1538    1.6%
9       1351    1.4%
Other values (2671)  47898  48.4%

Minimum 5 values
Value   Count   Frequency (%)
0       24428   24.7%
1       7305    7.4%
2       4541    4.6%
3       3347    3.4%
4       2669    2.7%

Maximum 5 values
Value   Count   Frequency (%)
261197  1       < 0.1%
178166  1       < 0.1%
152014  1       < 0.1%
106025  1       < 0.1%
82623   1       < 0.1%

mobile_likes
Real number (ℝ≥0)

ZEROS

Distinct: 2396
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.1162995
Minimum: 0
Maximum: 25111
Zeros: 35056
Zeros (%): 35.4%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 4
Q3: 46
95th percentile: 481.9
Maximum: 25111
Range: 25111
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 445.2529851
Coefficient of variation (CV): 4.195896268
Kurtosis: 360.9885806
Mean: 106.1162995
Median Absolute Deviation (MAD): 4
Skewness: 14.16123656
Sum: 10505832
Variance: 198250.2207
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value   Count   Frequency (%)
0       35056   35.4%
1       6297    6.4%
2       3941    4.0%
3       2917    2.9%
4       2265    2.3%
5       1794    1.8%
6       1598    1.6%
7       1395    1.4%
8       1212    1.2%
9       1149    1.2%
Other values (2386)  41379  41.8%

Minimum 5 values
Value   Count   Frequency (%)
0       35056   35.4%
1       6297    6.4%
2       3941    4.0%
3       2917    2.9%
4       2265    2.3%

Maximum 5 values
Value   Count   Frequency (%)
25111   1       < 0.1%
21652   1       < 0.1%
16732   1       < 0.1%
14039   1       < 0.1%
13529   1       < 0.1%

mobile_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 2004
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84.1204913
Minimum: 0
Maximum: 138561
Zeros: 30003
Zeros (%): 30.3%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 4
Q3: 33
95th percentile: 317
Maximum: 138561
Range: 138561
Interquartile range (IQR): 33

Descriptive statistics

Standard deviation: 839.8894437
Coefficient of variation (CV): 9.984362083
Kurtosis: 15522.64932
Mean: 84.1204913
Median Absolute Deviation (MAD): 4
Skewness: 107.5312999
Sum: 8328181
Variance: 705414.2777
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value   Count   Frequency (%)
0       30003   30.3%
1       8243    8.3%
2       4948    5.0%
3       3608    3.6%
4       2944    3.0%
5       2383    2.4%
6       2022    2.0%
7       1745    1.8%
8       1521    1.5%
9       1437    1.5%
Other values (1994)  40149  40.6%

Minimum 5 values
Value   Count   Frequency (%)
0       30003   30.3%
1       8243    8.3%
2       4948    5.0%
3       3608    3.6%
4       2944    3.0%

Maximum 5 values
Value   Count   Frequency (%)
138561  1       < 0.1%
131244  1       < 0.1%
89911   1       < 0.1%
73333   1       < 0.1%
43410   1       < 0.1%

www_likes
Real number (ℝ≥0)

ZEROS

Distinct: 1726
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.96242538
Minimum: 0
Maximum: 14865
Zeros: 60999
Zeros (%): 61.6%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 7
95th percentile: 208
Maximum: 14865
Range: 14865
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 285.5601519
Coefficient of variation (CV): 5.715498191
Kurtosis: 449.1484832
Mean: 49.96242538
Median Absolute Deviation (MAD): 0
Skewness: 16.91102529
Sum: 4946430
Variance: 81544.60033
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value   Count   Frequency (%)
0       60999   61.6%
1       4697    4.7%
2       2760    2.8%
3       1948    2.0%
4       1419    1.4%
5       1202    1.2%
6       1081    1.1%
7       897     0.9%
8       792     0.8%
9       757     0.8%
Other values (1716)  22451  22.7%

Minimum 5 values
Value   Count   Frequency (%)
0       60999   61.6%
1       4697    4.7%
2       2760    2.8%
3       1948    2.0%
4       1419    1.4%

Maximum 5 values
Value   Count   Frequency (%)
14865   1       < 0.1%
12903   1       < 0.1%
11077   1       < 0.1%
10763   1       < 0.1%
10627   1       < 0.1%

www_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1636
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.56883125
Minimum: 0
Maximum: 129953
Zeros: 36864
Zeros (%): 37.2%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 2
Q3: 20
95th percentile: 227
Maximum: 129953
Range: 129953
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 601.416348
Coefficient of variation (CV): 10.26853934
Kurtosis: 23812.2491
Mean: 58.56883125
Median Absolute Deviation (MAD): 2
Skewness: 126.257317
Sum: 5798490
Variance: 361701.6237
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value   Count   Frequency (%)
0       36864   37.2%
1       8513    8.6%
2       5111    5.2%
3       3586    3.6%
4       2828    2.9%
5       2317    2.3%
6       1918    1.9%
7       1602    1.6%
8       1445    1.5%
9       1373    1.4%
Other values (1626)  33446  33.8%

Minimum 5 values
Value   Count   Frequency (%)
0       36864   37.2%
1       8513    8.6%
2       5111    5.2%
3       3586    3.6%
4       2828    2.9%

Maximum 5 values
Value   Count   Frequency (%)
129953  1       < 0.1%
62103   1       < 0.1%
39605   1       < 0.1%
39213   1       < 0.1%
34039   1       < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. Consequently, for an exactly linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
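The definition and the invariance property described above can be checked directly. The following is a minimal sketch in plain Python (standard library only). `pearson_r` is an illustrative helper, not a pandas-profiling function, and the sample values are made up.

```python
import math
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = fmean(xs), fmean(ys)
    # Sums of cross-products and squares; the 1/n (or 1/(n - 1)) factors of the
    # covariance and the variances cancel in the ratio, so they are omitted.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])

# For an exactly linear relationship only the sign of the slope matters.
r_up = pearson_r(x, [3 * v + 1 for v in x])
r_down = pearson_r(x, [-0.5 * v for v in x])

print(round(r, 4), round(r_rescaled, 4), round(r_up, 4), round(r_down, 4))
# prints: 0.7746 0.7746 1.0 -1.0
```

Working with plain sums rather than explicit covariance and variance keeps the sketch short. The normalising factor appears in both the numerator and the denominator, so the result is the same whether population or sample estimates are assumed.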

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.
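Under the definition above, Spearman's ρ is Pearson's r applied to ranks. The sketch below assumes the common convention that tied values receive the average of the ranks they span. Ties matter here, since several variables in this report are 22% to 62% zeros. `average_ranks` and `spearman_rho` are hypothetical helper names, and the data is made up.

```python
import math
from statistics import fmean


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations (the 1/n factors cancel)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but non-linear
print(round(pearson_r(x, y), 4), round(spearman_rho(x, y), 4))
print(average_ranks([0, 0, 5, 3]))  # [1.5, 1.5, 4.0, 3.0]
```

For the cubic example ρ is 1 because the ranks agree exactly, while r stays below 1 because the relationship is not a straight line.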

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2, where n is the number of observations.
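The pair-counting definition above can be written out directly. This is a minimal sketch of that variant, known as tau-a, with a hypothetical helper name and made-up data. It visits every one of the n(n - 1)/2 pairs, so it is O(n²) and meant only as an illustration.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # sign == 0: a tie in x or y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


# 5 concordant and 1 discordant pair out of 6, so tau-a = 4/6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

With 99,003 rows this loop would visit about 4.9 billion pairs. Library implementations use faster sorting-based algorithms, and many report tau-b instead, which adjusts the denominator for tied pairs. SciPy's `scipy.stats.kendalltau`, for example, computes tau-b by default. Given the large share of zeros in several columns here, the two variants can differ noticeably.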

Phik (φk)

Phik (φk) is a relatively new correlation coefficient. It works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. The phik package provides extensive documentation.

Missing values

Sample

First rows

   age  gender  tenure  friend_count  friendships_initiated  likes  likes_received  mobile_likes  mobile_likes_received  www_likes  www_likes_received
0  14   male    266.0   0             0                      0      0               0             0                      0          0
1  14   female  6.0     0             0                      0      0               0             0                      0          0
2  14   male    13.0    0             0                      0      0               0             0                      0          0
3  14   female  93.0    0             0                      0      0               0             0                      0          0
4  14   male    82.0    0             0                      0      0               0             0                      0          0
5  14   male    15.0    0             0                      0      0               0             0                      0          0
6  13   male    12.0    0             0                      0      0               0             0                      0          0
7  13   female  0.0     0             0                      0      0               0             0                      0          0
8  13   male    81.0    0             0                      0      0               0             0                      0          0
9  13   male    171.0   0             0                      0      0               0             0                      0          0

Last rows

agegendertenurefriend_countfriendships_initiatedlikeslikes_receivedmobile_likesmobile_likes_receivedwww_likeswww_likes_received
9899319male394.04538414445011508844355961669127
9899420female402.01988332735110602572487333310332692
9899520female699.03611973450777684414690993859
9899624female182.0293812726018177655843117081756057
9899728female290.022181618462610268429042503366018
9899868female541.021183413996180893505118874916202
9899918female21.01968172044011341243991059222820
9900015female111.0200215241195912554119591146201092
9900123female416.0256018545066516450657600756
9900239female397.020497689410124439410953002913

Duplicate rows

Most frequent

    age  gender  tenure  friend_count  friendships_initiated  likes  likes_received  mobile_likes  mobile_likes_received  www_likes  www_likes_received  count
50  25   male    21.0    0             0                      0      0               0             0                      0          0                   6
31  22   male    24.0    0             0                      0      0               0             0                      0          0                   4
32  23   female  0.0     0             0                      0      0               0             0                      0          0                   4
2   14   male    0.0     0             0                      0      0               0             0                      0          0                   3
13  17   male    14.0    0             0                      0      0               0             0                      0          0                   3
21  18   male    13.0    0             0                      0      0               0             0                      0          0                   3
25  19   male    0.0     0             0                      0      0               0             0                      0          0                   3
33  23   male    0.0     0             0                      0      0               0             0                      0          0                   3
46  25   male    0.0     0             0                      0      0               0             0                      0          0                   3
47  25   male    2.0     0             0                      0      0               0             0                      0          0                   3